1. Introduction

In [20] an investigation of the dynamic properties of computing ma- 
chines using a general lossless compression approach led to reason- 
able classifications of elementary Cellular Automata (CA) and other 
systems, classifications corresponding to Wolfram's four classes of be- 
haviour [18]. In the spirit of other analytical concepts for scale pre- 
dictability (for example, Lyapunov exponents), but employing different 
means, this compression-based method also led to the definition of a 
phase transition coefficient as a way of detecting a system's (in)stability 
vis-a-vis its initial conditions and of measuring its dynamic ability to 
carry information. A conjecture relating the magnitude of this coeffi- 
cient and the capability and efficiency with which a system performs 
universal computation was introduced. In this paper the conjecture is 
developed further, with some additional arguments. 

In [21], a related conjecture concerning other kinds of simply defined
programs was presented, proposing that all Busy Beaver Turing machines
may be capable of universal computation, as they seem to share
some of the informational and complex properties of systems capable of
universal computational behaviour. The conjecture will be regarded in
light of algorithmic complexity, particularly of Bennett's logical depth
[1], and will be reconnected to the first conjecture via the dynamical
properties of these machines through the compression-based phase
transition coefficient.

Some definitions of concepts to be discussed either as foundations of 
these possible new connections, or as evidence for making such claims 
will be introduced first. The investigation is meant to be an exploration 
of empirical observations through quantitative measures which attempt 
to capture qualitative properties of the dynamic behaviour of systems 
capable of computational universality. 

1.1 Preliminaries

Proof-of-universality results for simple programs have traditionally re- 
lied on localized structures (or "particles"), as distinguished from rel- 
atively uniform regions. This means that a measure of entropy of a 
system will tend to be below its theoretical maximum. At the same 
time, however, this "particle-like" behaviour is, and must in principle 
be, unpredictable for the system to reach computational universality.

Stephen Wolfram has classified all the one-dimensional nearest neigh- 
borhood CA into four classes [18]: (i) Class 1: ordered behaviour; (ii) 
Class 2: periodic behaviour; (iii) Class 3: random or chaotic behaviour; 
(iv) Class 4: complex behaviour. The first two are totally predictable. 
Random CA are unpredictable. Somewhere in between, in the transi- 
tion from periodic to chaotic, complex, interesting behaviour can occur. 

One of Wolfram's open problems [17] in cellular automata, for ex- 
ample, is the question of the computational universality of a system 
belonging to Class 3 (random-looking, such as rule 30) for which an 
entropy measure remains near its maximum at every time step, and 
which is unlikely to show any "particle-like" behaviour. The question 
is whether such a "hot system" can carry information and be pro- 
grammed. The techniques to prove such a system universal may require 
methods different from those hitherto used for systems in which struc- 
tures can be distinguished and which can therefore be made to carry 
information through them. The common belief is that these kinds of 
systems may be powerful enough but are just too complicated, perhaps 
even impossible to program. The encoding required to deal with the
sophistication of a Class 3 rule cellular automaton would itself probably
have to possess the sophistication of a computationally universal
system. This brings us to Wolfram's Principle of Computational
Equivalence (PCE), which states that almost all processes that are not
obviously simple can be viewed as computations of equivalent
sophistication ([18], pp. 5 and 716-717).

1.2 The behaviour of simple programs

In 1970, Conway invented an automaton, which was popularised by 
Gardner [9] and was known as the Game of Life. It was proved that 
Life was capable of universal computation [3]. The proof of universality
uses what in the jargon of CA are known as gliders, glider guns, and 
eaters, that is, structures to carry and manipulate information through 
the system (by combining such emergent propagating structures one 
can simulate logic gates and circuits). 

Langton's ant [11] is a two-dimensional Turing machine with 2 symbols
and 4 states following a set of very simple rules (footnote 1). In [8], a very
simple construction is presented which proves that Langton's ant is 
also capable of universal computation. 
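
The three rules listed in footnote 1 are short enough to run directly. The following is a minimal sketch, assuming an unbounded, initially all-white grid and an arbitrary initial heading; the function name `langton_ant` and the set-based grid representation are illustrative choices, not taken from [11] or [8].

```python
def langton_ant(steps):
    """Run the ant of footnote 1 for `steps` moves on an all-white grid.

    Returns the final head position and the set of black squares.
    """
    black = set()        # coordinates of black squares (all others are white)
    x, y = 0, 0          # head position
    dx, dy = 0, 1        # heading; starting "up" is an arbitrary assumption
    for _ in range(steps):
        if (x, y) in black:
            dx, dy = dy, -dx          # (a) black square: turn 90 degrees right
            black.remove((x, y))      # (c) leaving the square flips its colour
        else:
            dx, dy = -dy, dx          # (b) white square: turn 90 degrees left
            black.add((x, y))         # (c) leaving the square flips its colour
        x, y = x + dx, y + dy         # move forward one unit
    return (x, y), black


position, black = langton_ant(4)
print(position, len(black))
```

After four moves the head has traced a unit square and is back at its starting position, with four squares painted black; longer runs can be explored by increasing `steps`.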

In contrast to these purpose-built systems, an exhaustive exploration of
one-dimensional elementary CA (which by most standards would be
considered the simplest possible CA) was undertaken in [18]. The rule
with number 110 (and equivalent rules: 124,
137 and 193) in Wolfram's numbering scheme, presenting the charac- 
teristic "particle-like" structures, turned out to be capable of universal 
computation [18, 5]. Rule 110 can be set up with initial configurations 
that have signals transmitted in the form of collisions of "particle-like" 
dynamical structures, simulating a variant of a tag system, another 
rewriting system capable of universal computation. 
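
The equivalence between rule 110 and rules 124, 137 and 193 can be checked mechanically: the three rules are obtained from rule 110 by left-right reflection, by black-white complementation, and by both together. The sketch below assumes Wolfram's standard numbering, in which bit 4l + 2c + r of the rule number gives the new value of a cell with neighbourhood (l, c, r); the helper names are illustrative.

```python
def rule_table(rule):
    """Map each neighbourhood (l, c, r) to the new cell value of an elementary CA."""
    return {(l, c, r): (rule >> (4 * l + 2 * c + r)) & 1
            for l in (0, 1) for c in (0, 1) for r in (0, 1)}


def rule_number(table):
    """Inverse of rule_table: rebuild the rule number from a lookup table."""
    return sum(v << (4 * l + 2 * c + r) for (l, c, r), v in table.items())


def mirror(rule):
    """Rule obtained by exchanging left and right neighbours."""
    t = rule_table(rule)
    return rule_number({(l, c, r): t[(r, c, l)] for (l, c, r) in t})


def complement(rule):
    """Rule obtained by exchanging the roles of the two colours."""
    t = rule_table(rule)
    return rule_number({(l, c, r): 1 - t[(1 - l, 1 - c, 1 - r)] for (l, c, r) in t})


print(mirror(110), complement(110), mirror(complement(110)))
```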

The proofs of universality of all these systems imply that their dy- 
namics are unpredictable. The notion of universality implies the ex- 
istence of undecidable problems related to most questions concerning 
these machines. Questions related to these simple dynamical systems
cannot therefore be answered algorithmically. It follows
that undecidability is a measure of the unpredictability of a system
associated with its dynamical behaviour.

1.3 Quantitative measures of qualitative behaviour

Definition 1 [10, 6, 12]. $K_U(s) = \min\{|p| : U(p) = s\}$, where $|p|$ is
the length of $p$ measured in bits.

Footnote 1: (a) If the machine head is on a black square, it turns 90 degrees
right and moves forward one unit. (b) If the head is on a white square, it
turns 90 degrees left and moves forward one unit. (c) When the head leaves a
square, it prints the opposite colour.

A measure of complexity is derived by combining the algorithmic
complexity describing a system and the time it takes to produce a
string. Bennett's concept of Logical Depth [1, 2] is a complexity measure
capturing the structure of a string defined by the time that a Turing
machine takes to reproduce the said string from its (near) shortest
description. Formally,

Definition 2. A string's logical depth $D$ is given by
$D(s) = \min\{t(p) : (|p| < |p_t|) \wedge U(p) = s\}$ (see footnote 2).

According to this measure, the longer it takes, the more complex 
the string. Complex objects are therefore those which can be seen as 
"containing internal evidence of a nontrivial causal history." 

2. Compression-based phase transition coefficient

A measure based on the change of the asymptotic direction of the size 
of the compressed evolutions of a system for different initial configu- 
rations (following a proposed Gray-code enumeration of initial config- 
urations) was presented in [20]. It gauged the resiliency or sensitivity 
of a system vis-a-vis its initial conditions. This phase transition coeffi- 
cient led to an interesting characterisation and classification of systems, 
which when applied to elementary CA, yielded exactly Wolfram's four 
classes of system behaviour, with no human intervention. The coefficient
works by compressing the changes of the different evolutions
through time, normalised by evolution space. It is rooted in the
concept of algorithmic complexity, since the length of a compressed
string is an upper bound on its algorithmic complexity. The more
compressible a string, the less algorithmically complex it is.
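
The idea that a compressed length bounds algorithmic complexity from above can be illustrated on elementary CA. The following sketch is an assumption-laden stand-in for the method of [20]: it uses zlib as the lossless compressor, cyclic boundary conditions, a single black cell as the initial condition, and one ASCII character per cell. None of these choices is taken from the source.

```python
import zlib


def eca_step(cells, rule):
    """One step of an elementary CA with cyclic boundaries (an assumption)."""
    n = len(cells)
    return [(rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]


def evolve(rule, init, steps):
    """Return the list of rows: the initial row followed by `steps` updates."""
    rows = [list(init)]
    for _ in range(steps):
        rows.append(eca_step(rows[-1], rule))
    return rows


def compressed_length(rows):
    """Length in bytes of the zlib-compressed evolution (one character per cell)."""
    data = "".join(str(c) for row in rows for c in row).encode("ascii")
    return len(zlib.compress(data, 9))


width, steps = 201, 200
init = [0] * width
init[width // 2] = 1          # single black cell in the middle
sizes = {rule: compressed_length(evolve(rule, init, steps)) for rule in (0, 30, 110)}
print(sizes)
```

The blank evolution of rule 0 compresses to far fewer bytes than the random-looking evolution of rule 30. The absolute numbers depend on the compressor and encoding and should be read only as rough upper bounds.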

Let the characteristic exponent $c_n^t$ be defined as the mean of the
absolute values of the differences of the compressed lengths of the outputs
of the system $M$ running over the initial segment of initial conditions
$i_j$ with $j \in \{1, \ldots, n\}$, following the numbering scheme devised
in [20] based on a Gray-code optimal enumeration scheme, running for $t$
steps in intervals of $n$. Formally,

Definition 3.
$$c_n^t = \frac{|C(M_t(i_1)) - C(M_t(i_2))| + \cdots + |C(M_t(i_{n-1})) - C(M_t(i_n))|}{t(n-1)}$$
where $C(M_t(i_j))$ is the compressed length of the output of $M$ after $t$
steps on initial condition $i_j$.

Definition 4. Let $C$ denote the transition coefficient of a system $U$,
defined as $C(U) = f'(S_c)$, the derivative of the line that fits the
sequence $S_c$ by least squares as described in [20], with
$S_c = S(c_n^t)$ for a fixed $n$ and $t$.
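
Definitions 3 and 4 can be sketched in code under several loudly stated assumptions, since the details are deferred to [20]: zlib stands in for the compressor, the initial conditions are binary-reflected Gray codes written around the centre of a blank cyclic row, and the sequence fitted by least squares is taken over increasing numbers of steps t. All function names and default parameters below are hypothetical.

```python
import zlib


def eca_step(cells, rule):
    """One step of an elementary CA with cyclic boundaries (an assumption)."""
    n = len(cells)
    return [(rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]


def compressed_output_length(rule, init, t):
    """Compressed length of the evolution of `rule` from `init` over t steps."""
    rows = [init]
    for _ in range(t):
        rows.append(eca_step(rows[-1], rule))
    data = "".join(str(c) for row in rows for c in row).encode("ascii")
    return len(zlib.compress(data, 9))


def gray_initial_conditions(n, width):
    """First n non-zero Gray codes, written leftwards from the centre cell."""
    configs = []
    for j in range(1, n + 1):
        g = j ^ (j >> 1)                  # binary-reflected Gray code of j
        cells = [0] * width
        k = 0
        while g:
            cells[width // 2 - k] = g & 1
            g >>= 1
            k += 1
        configs.append(cells)
    return configs


def characteristic_exponent(rule, n, t, width=64):
    """Mean absolute difference of consecutive compressed lengths, over t(n-1)."""
    sizes = [compressed_output_length(rule, init, t)
             for init in gray_initial_conditions(n, width)]
    return sum(abs(a - b) for a, b in zip(sizes, sizes[1:])) / (t * (n - 1))


def least_squares_slope(ys):
    """Slope of the least-squares line through (1, ys[0]), (2, ys[1]), ..."""
    xs = range(1, len(ys) + 1)
    mx, my = sum(xs) / len(ys), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)


def transition_coefficient(rule, n=16, ts=(10, 20, 30, 40)):
    """Slope of the fitted line through the characteristic exponents (assumed indexing by t)."""
    return least_squares_slope([characteristic_exponent(rule, n, t) for t in ts])


for rule in (0, 30, 110):
    print(rule, transition_coefficient(rule))
```

Values produced by this sketch are not expected to reproduce those published in [20], which depend on the compressor, the enumeration and the run lengths used there.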

Footnote 2: Bennett provides a careful elaboration [1] of the notion of
logical depth taking into account near-shortest programs as well as the
shortest ones.

The value C(U), based on the phase transition coefficient, is a stable
indicator of the degree of the qualitative dynamical change of a system
U. The larger the derivative, the greater the change. According to C,
rules such as 0 and 30 appear close to each other, both because
their behaviour remains the same despite changes in initial conditions, and
because their evolution cannot be perturbed. The measure indicates
that rules like rule 0 or rule 30 are also incapable of or inefficient at
transmitting any information, given that they do not react to changes
in the input of the system. Odd as it may seem, this is because there is
no change in the qualitative behaviour of these CA when feeding them
with different inputs, regardless of how different the inputs may be: rule 0
remains entirely blank while rule 30 remains mostly random-looking, with
no apparent emergent coherent propagating structures (other than the
regular and linear pattern on one of the sides).

On the other hand, rules such as rule 122 and rule 89 appear next 
to each other as the most sensitive to initial conditions, because as the 
investigation proves, they are both highly sensitive to initial conditions 
and present phase transitions which dramatically change their qualita- 
tive behaviour when starting from one or another initial configuration. 
This means that rules 122 and 89 can be more successfully used to 
transmit information from the input to the output. 

2.1 Connecting dynamic behaviour and Turing universality

Evidently, if a system is completely predictable and therefore dynamically
trivial, it is decidable, and therefore not Turing universal. Rule
110 should therefore not be very predictable according to the phase
transition measure, but at the same time we can expect it to be versatile
enough to produce the variety needed to behave as a universal system.
Rule 110 is one rule about which my own phase transition classification
says that, despite showing some sensitivity, it also shows some stability.
This means that one can say with some degree of certainty how it will
look (and behave) for certain steps and certain initial configurations,
unlike the rules at the top of the classification.

This is acknowledged by Wolfram himself when discussing
rule 54 ([18], page 697): "It could be that if one went just a
little further in looking at initial conditions one would see more
complicated behaviour. And it could be that even the structures
shown above can be combined to produce all the richness that is
needed for universality. But it could also be that whatever one
does rule 54 will always in the end just show purely repetitive or
nested behaviour, which cannot on its own support universality."

For every CA rule, there is a definite (often undecidable) answer 
to the question whether or not it is capable of universal computation 
(or in reachability terms, whether a CA will evolve into a certain con- 
figuration). The question only makes sense if the evolution of a CA 
depends on its initial configuration. No rule can be universal that fixes
the initial configuration once and for all (there would be no way to 
input an instruction and carry out an arbitrary computation). 

An obvious feature of universal systems is that they need to be capable
of carrying information by reflecting changes made to the input and
transmitted to the output. In attempting to determine whether a system
is capable of reaching universal computation, one may ask whether
a system is capable of some minimal versatility in the first place, and
how efficiently it can transmit information. This is what the phase
transition coefficient measures: it indicates how well a system manages
to respond to an input. Obviously, a system such as rule 0 or rule 255,
which does not change regardless of the input, is trivially decidable.
But a universal system should be capable of reacting to external
manipulation (the input to the system) in order to behave as a universal
system, that is, to be capable of simulating and reaching the output of
any other universal system.

Conjecture 1: Let U be a machine capable of (efficient) universal 
behaviour. Then C(U) > 0. 

Conjecture 1 is one-way only, meaning that it states that an efficient 
universal system should be equipped with these dynamical properties, 
but the converse does not necessarily hold, given that having a large 
transition coefficient by no means implies that the system will behave 
with the freedom required for Turing universality (a case in point is 
rule 22, which, despite having the largest transition coefficient, seems 
restricted to a small number of possible evolutions). 

2.2 Evidence and discussion of a qualitative characterisation

The conjecture is based on the following observations: 

1. The phase transition coefficient provides information on the ability of 
a system to react to external stimuli. 

2. Universal systems are (efficient) information processors capable of car- 
rying and transmitting information. 

3. Trivial systems and random-looking systems are incapable of trans- 
mitting information. 

4. Trivial systems have negative C values, close to zero. 

5. Rules such as 110, proven to be universal, and rule 54 (suspected to 
be universal, see [18] page 697) turn out to be classified next to each 
other, with a positive transition coefficient. 

The capacity for universal behaviour implies that a system is capable
of being programmed and is therefore reactive to external input. It is
no surprise that universal systems should be capable of responding
to their input and doing so succinctly, if the systems in question are
efficient universal systems. If the system is incapable of reacting to
any input or if the output is predictable (decidable) for any input, the
system cannot be universal.

Values for the subclass of CA referred to as elementary (the simplest
one-dimensional CA) have been calculated and published in [20]. We
will refrain from evaluations of C to avoid distracting the reader with 
numerical approximations that may detract from our larger goal. The 
aim is to propose some basics of a behavioural characterisation of com- 
putational universality. 





Figure 1. ECA rule 4 is a kind of program filter that only transfers bits in
isolation (i.e. when its neighbors are both white). It is clear that one can
perform some very limited computations with this automaton.
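
The filtering behaviour described in the caption of Figure 1 can be checked with a few lines of code. This is a minimal sketch assuming Wolfram's rule numbering and cyclic boundaries; the example row is arbitrary.

```python
def eca_step(cells, rule):
    """One step of an elementary CA with cyclic boundaries (an assumption)."""
    n = len(cells)
    return [(rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]


# Rule 4 maps only the neighbourhood (0, 1, 0) to 1, so a black cell survives
# exactly when both of its neighbours are white.
row = [0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0]
filtered = eca_step(row, 4)
print(filtered)   # isolated bits at positions 1 and 12 survive; blocks vanish
```

Applying the rule a second time leaves `filtered` unchanged, which is one way of seeing how limited the computations available to this automaton are.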

For example, some rules, such as rule 0, don't produce different
configurations for different initial configurations. No matter how one
changes the initial condition, there is no way to make such a rule produce
anything other than what it computes for every other initial configuration.
The coefficient rules out these trivial elementary CA rules automatically.
This matters particularly for the simplest among them, which cannot
usually be ruled out as candidates for universal behaviour, given that
even if they look trivial for the simplest or for certain initial
configurations, they could still be capable of the necessary versatility
and eventually be programmed, in light of the space of all possible inputs
to which they may be sensitive. The foundations of conjecture 1 and the
conjecture itself are consistent with all these observations, but the
conjecture is most meaningful for systems that are believed to be of great
complexity but are usually not believed to be malleable enough to be
programmed as universal systems, as is the case with rule 30. If the
conjecture is true, C(U) may not only rule out systems which intuition
strongly suggests are unable to behave as universal systems, but would
also indicate that random-looking systems such as rule 30 are not capable
of universal computation because they are incapable of carrying
information. In this sense, the measure may also be a characterisation of
the practical randomness of a system in terms of efficient information
transmission.

Rule 110, however, has a positive C value, meaning it is efficient at 
carrying information from its input through the output, and that one 
can actually program it to perform computations. C is compatible with 
the fact that it has been proven that rule 110 is capable of universal 
computation. 

Figure 2. It is an open question whether ECA rule 30 can be programmed to
perform computations. Its C value is low, meaning that it is not efficient for
transferring information because it always behaves in the same fashion, that
is, too randomly.

A universal computer would therefore have a non-zero C limit value.
C also captures some of the universal computational efficiency of the 
computer in that it has the advantage of capturing not only whether it 
is capable of reacting to the input and transferring information through 
its evolution, but also the rate at which it does so. So C is an index 
of both capability in principle and ability in practice. A non-zero C 
means that there is a way to codify a program to make the system 
behave (efficiently) in one fashion or another, i.e. to be programmable. 
Something that is not programmable cannot therefore be taken to be 
a computer. 

In [14], Margolus asserts that reversible cellular automata (RCA) 
can actually be used as computer models embodying discrete analogues 
of classical notions in physics such as space, time, locality and micro- 
scopic reversibility. He suggests that one way to show that a given rule 
can exhibit complicated behaviour (and eventually universality) is to 
show (as has been done with the Game of Life [9] and rule 110 [5, 18]) 
that "in the corresponding 'world' it is possible to have computers" 
starting these automata with the appropriate initial states, with digits 
acting as signals moving about and interacting with each other to, for
example, implement a logical gate for digital computation. 

Conjecture 1 also seems to be in agreement with Wolfram's beliefs 
concerning rule 30, which according to his Principle of Computational 
Equivalence (PCE) [18] may be computationally universal and still 
be impossible to control so as to be able to perform a computation 
(something that Wolfram has himself suggested [18]). 

RCA are interesting because they allow information to propagate, 
and in some sense they can be thought of as perfect computers, indeed
in the sense that matters to us. If one starts an RCA from a non- 
uniformly random initial state, the RCA evolves, but because it cannot 
get simpler than its initial condition (for the same reason given for the 
random state) it can only get more complicated, producing a compu- 
tational history that is reversible and can only lead to an increase in 
entropy. 

3. On the possible computational power of Busy Beaver machines



3.1 Busy Beaver machines

Rado [16] studied the behaviour of a special kind of one-tape n-state
deterministic Turing machine, one that starts with a blank tape,
writes more non-blank symbols than any other halting n-state Turing
machine, and halts.

Notation: We denote by (n, 2) the class (or space) of all n-state 2- 
symbol Turing machines (with the halting state not included among 
the n states). 

Definition 5. [16] If $\sigma_T$ is the number of 1s on the tape of a Turing
machine $T$ upon halting, then
$$\Sigma(n) = \max\{\sigma_T : T \in (n,2),\ T(n) \text{ halts}\}. \tag{1}$$
If $t_T$ is the number of steps that a machine $T$ takes upon halting, then
$$S(n) = \max\{t_T : T \in (n,2),\ T(n) \text{ halts}\}. \tag{2}$$

$\Sigma(n)$ and $S(n)$ as defined in (1) and (2) are noncomputable functions
by reduction to the halting problem. Yet values are known for (n, 2)
with $n \leq 4$.
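
For the smallest non-trivial case the two quantities of Definition 5 can be observed directly. The sketch below simulates a known 2-state 2-symbol Busy Beaver on a blank tape, in Rado's convention where the halting transition also writes and moves; the dictionary encoding and the name `run_turing_machine` are illustrative choices.

```python
def run_turing_machine(rules, max_steps=10_000):
    """Run a machine from state "A" on a blank tape.

    `rules` maps (state, symbol) to (symbol to write, move, next state);
    "H" is the halting state. Returns (halted, steps taken, number of 1s).
    """
    tape, pos, state, steps = {}, 0, "A", 0
    while state != "H" and steps < max_steps:
        write, move, state = rules[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += 1 if move == "R" else -1
        steps += 1
    return state == "H", steps, sum(tape.values())


# A 2-state Busy Beaver champion (states A and B, plus the halting state H).
BB2 = {
    ("A", 0): (1, "R", "B"),
    ("A", 1): (1, "L", "B"),
    ("B", 0): (1, "L", "A"),
    ("B", 1): (1, "R", "H"),
}

print(run_turing_machine(BB2))   # (True, 6, 4)
```

The machine halts after 6 steps leaving four 1s on the tape, matching the known values S(2) = 6 and Σ(2) = 4.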

The Busy Beaver problem lies at the heart of what may be seen 
as a paradox, for while a Busy Beaver machine of n states can be 
thought of as having maximal sophistication vis-a-vis all n state Turing 
machines as regards the number of steps and printed symbols, Busy 
Beaver machines can be extremely easily defined. The definition of 
Busy Beaver machines describes an infinite set of Turing machines 
characterised by a particular behaviour: the attribute of printing more
non-blank symbols on the tape before halting, or having the longest 
runtime among all Turing machines of the same size (number of states). 

Bennett's logical depth measure is relevant in characterising the 
complexity of an n-state Busy Beaver machine both in terms of size 
(fixed among all n-state machines) and in terms of the behaviour that
characterises this type of machine, because it follows from Rado's def- 
initions and Bennett's concept of logical depth that Busy Beavers are 
the deepest machines provided that they are the ones with the longest 
history producing a string. 

Yet a Busy Beaver is required to halt. When running for the longest 
time or writing the largest number of non-blank symbols, bb(n) has 
to be clever enough to make wise use of its resources and still save a 
rule to halt. These facts may suggest the following conjectures, also 
in connection with the dynamic behaviour of a set of simply described 
machines with universal behaviour. 

Conjecture 2: 

1. (strong version): For all n > 2, bb(n) is capable of universal
computation.

2. (sparse version): For some n, bb(n) is capable of universal computation. 

3. (weak version): For all n > 2, bb(n) is capable of (weak) universal 
computation. 

4. (weakest version): For some n, bb(n) is capable of (weak) universal 
computation. 

It is known that among all 2-state 2-symbol Turing machines none
can be universal. Remember, however, that bb(n) as defined by Rado
[16] is a Turing machine with n states plus a special halting state.
So bb(2) is actually a 3-state 2-symbol machine in which one state is
reserved for halting only. By letting bb(n) be a weak universal
machine, one allows initial tape configurations other than those filled
with just a single symbol (usually called a blank tape, although blankness
is a symbol in itself), but restricted to initial configurations simple
enough to guarantee that the computation is not already performed in
the input encoding. In other words, bb(n) is allowed (in conjecture
versions 2.3 and 2.4) to start either from a periodic tape configuration
or from an infinite sequence of the type accepted by a regular
ω-automaton [19].

3.2 Discussion of the characterisation

If any version of the conjectures excepting conjecture 2.4 is true, the
characterisation would define a countably infinite set of universal Turing
machines. Their proof may provide an interesting framework and
a possible path to take for proving a whole set of Turing machines 
to be capable of universal computation on the basis of their common 
dynamical properties. 

Because machines that always halt cannot be capable of
unbounded computation, and therefore of universal Turing behaviour,
among the analytical tools necessary to demonstrate the universality
of any of these systems are proofs that Busy Beavers are capable of
avoiding the halting state. If one proves that Busy Beavers always halt,
that would amount to proving that they cannot be universal. But to
disprove conjectures 2.1 to 2.3 one can simply prove that at least one
Busy Beaver is not capable of a halting configuration, and a study of
this type is likely to be simplified for bb(3) or bb(4), for which the Busy
Beaver functions are known and which are Turing machines small enough to
be subjected to a thorough and potentially fruitful investigation in this
regard. The investigation of the behaviour of Busy Beaver machines
for other than blank tape initial configurations indicates that these ma- 
chines are capable of non-trivial behaviour for other than the simplest 
initial configuration (as intuition would suggest, given that if they be- 
have in a sophisticated fashion for the simplest initial condition, they 
may be expected to continue doing so for more complicated ones). In a 
future paper we will explore the specific behaviour of these machines. 

The truth of the conjectures may not seem intuitively evident to 
all researchers, given that it is possible that these machines are only 
concerned with producing the largest numbers by using all resources 
at hand, regardless of whether they do so intelligently. However, the 
requirement to halt is, from our point of view, a suggestion that the 
machine has to use its resources intelligently enough in order to keep 
doing its job while saving a special configuration for the halting state. 

Despite the conclusion that conjecture 2.4 would imply, namely that
the property of being a Busy Beaver machine is not a characterisation
of the computational power of this easily describable, countably
infinite set of machines, among the intuitions suggesting the truth of one
of these conjectures is that it is easier to find a machine capable of 
halting and performing unbounded computations for a Turing machine 
if the machine already halts after performing a sophisticated calculation 
than it is to find a machine showing sophisticated behaviour whose 
previous characteristic was simply to halt. This claim can actually be 
quantified, given that the number of Turing machines that halt after 
t = n for increasing values of n decreases exponentially [7, ?]. In other 
words, if a machine capable of halting is chosen by chance, there is an 
exponentially increasing chance of finding that it will halt sooner rather 
than later, meaning that most of these machines will behave trivially 
because they won't have enough time to do anything interesting before 
halting. 
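
The distribution of halting times can be tabulated exhaustively for the smallest machine space. The following sketch enumerates all 2-state 2-symbol machines in Rado's formalism, where each of the four table entries writes a symbol, moves left or right, and goes to one of the two states or to the halting state (12 options per entry), and counts how many halt on a blank tape at each step. The step bound `max_steps` and all names are assumptions made for illustration; a table this small cannot by itself establish the exponential decay reported in the cited work.

```python
from collections import Counter
from itertools import product


def halting_time(rules, halt, max_steps):
    """Steps taken to halt on a blank tape, or None if still running at the bound."""
    tape, pos, state = {}, 0, 0
    for step in range(1, max_steps + 1):
        write, move, state = rules[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if state == halt:
            return step
    return None


def halting_time_histogram(n_states=2, max_steps=50):
    """Count n-state 2-symbol machines by their halting time on a blank tape."""
    halt = n_states                                   # index of the halting state
    entries = list(product((0, 1), (-1, 1), range(n_states + 1)))
    keys = [(s, b) for s in range(n_states) for b in (0, 1)]
    hist = Counter()
    for choice in product(entries, repeat=len(keys)):
        hist[halting_time(dict(zip(keys, choice)), halt, max_steps)] += 1
    return hist


hist = halting_time_histogram()
for t in sorted(k for k in hist if k is not None):
    print(t, hist[t])
print("not halted within the bound:", hist[None])
```

A third of the 20736 machines halt at the very first step, because their first transition leads straight to the halting state, and no machine in this space halts later than step 6.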

We have no positive proof of any version of these conjectures and 
much more work remains to be done on the dynamical behaviour of 
these systems. But conjectures 1 and 2 lead us to: 

Conjecture 3: C(bb(n)) > 0. 

4. Concluding remarks

The first conjecture relates computational universality to the capacity 
of a computational system to transfer information from the input to 
the output and reflect the changes in the evolution of the system when 
starting out from different initial configurations. We argued that
the property of having a large phase transition coefficient seems
necessary. On the other hand, a universal system seems to be capable of
manifesting an abundance of possible evolutions and reacting to differ- 
ent initial configurations in order to (efficiently) behave universally. 

A second conjecture concerning the possible universality of a kind 
of well-defined infinite set of abstract Busy Beaver Turing machines 
was introduced, also in terms of a version of a measure of complexity
related to algorithmic complexity and the dynamic behaviour of these 
machines having a particular common characterisation. The third con- 
jecture relates conjectures 1 and 2. 

These conjectures will be the subject of further study in a paper 
to follow this one. We would like to see the conjectures proved or 
disproved, but underlying the conjectures are many other interesting 
questions relating to the size, behaviour and complexity of computing 
machines. It would be interesting, for example, to find out whether 
there is a polynomial (or exponential) trade-off between program size 
and the concept of simulating a process. 